Towards Multimodal Perception and Semantic Understanding in a Developmental Model of Speech Acquisition
Abstract
Babbling is a crucial process in young infants for acquiring articulatory control. By constantly trying out motor commands and observing the consequences that these cause in the environment, infants develop an understanding of their own body and learn to control their articulators in order to produce meaningful speech. In earlier work, we proposed a developmental model of speech acquisition that learns to control a 3-d vocal tract simulation for producing vowel or syllable sounds. The system self-organizes learning according to the speech it perceives from its environment, so that ambient speech shapes the learning process. Here, we discuss how the proposed model could be extended to form a bridge between perception, at one end, and semantics, at the other. The idea is to connect acoustic perception with other perceptual modalities. As an example, we discuss how the system could integrate visual input in its learning loop. By learning associations between acoustic targets and simultaneous visual perceptions, such an enhanced model could produce speech not only in reaction to acoustic input, but also triggered by visual input. Vision, thus, could help to establish a common ground between learner and tutor for interactive articulatory learning.

I. A DEVELOPMENTAL MODEL OF SPEECH ACQUISITION

Infants explore in a goal-directed manner from the very beginning. Studies show that even neonates orient themselves towards more interesting targets [1], and prenatal exposure to their native language seems to influence infants' early babbling behavior (e.g. [2]). We combined these ideas in a developmental model of speech acquisition in which we model the influence of ambient speech on the learning process [3]. We provide a set of speech sounds to the system, from which it extracts the important components of a high-dimensional acoustic space representation via linear discriminant analysis.
The resulting 2-d goal space forms a low-dimensional representation of the speech that the system is exposed to in its environment. Through this dimension reduction, each full syllable is projected onto a single point in goal space. Fig. 1 depicts such a goal space trained from ambient speech sound sets consisting of the three syllables /a/, /ba/ and /ma/.

[Fig. 1: 2-d goal space trained from ambient speech sounds of the syllables /a/, /ba/ and /ma/.]

This research has been supported by the Cluster of Excellence Cognitive Interaction Technology 'CITEC' (EXC 277) at Bielefeld University, which is funded by the German Research Foundation (DFG), and is related to the European Project CODEFROR (FP7-PIRSES-2013-612555).
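The dimension reduction described in Section I can be illustrated with a minimal sketch of linear discriminant analysis that maps labeled acoustic feature vectors onto a 2-d goal space. This is not the authors' implementation: the 12-dimensional feature vectors are synthetic stand-ins for real acoustic features, and the function name `fit_lda_projection`, the ridge term and all parameter values are assumptions made for illustration only. With three syllable classes, LDA yields at most two discriminant directions, which is consistent with a 2-d goal space.

```python
import numpy as np


def fit_lda_projection(features, labels, n_components=2, ridge=1e-6):
    """Fit LDA and return (mean, projection) for mapping into goal space.

    Hypothetical helper for illustration; not taken from the paper.
    """
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    overall_mean = features.mean(axis=0)
    dim = features.shape[1]

    # Within-class scatter (small ridge keeps it positive definite)
    # and between-class scatter.
    s_within = ridge * np.eye(dim)
    s_between = np.zeros((dim, dim))
    for c in np.unique(labels):
        x_c = features[labels == c]
        mean_c = x_c.mean(axis=0)
        centered = x_c - mean_c
        s_within += centered.T @ centered
        diff = (mean_c - overall_mean)[:, None]
        s_between += len(x_c) * (diff @ diff.T)

    # Whiten with the Cholesky factor L of S_w, then solve the symmetric
    # eigenproblem of L^{-1} S_b L^{-T}.
    chol = np.linalg.cholesky(s_within)
    half = np.linalg.solve(chol, s_between)
    sym = np.linalg.solve(chol, half.T)
    sym = 0.5 * (sym + sym.T)
    eigvals, eigvecs = np.linalg.eigh(sym)

    # Keep the directions with the largest class separation.
    top = np.argsort(eigvals)[::-1][:n_components]
    projection = np.linalg.solve(chol.T, eigvecs[:, top])
    return overall_mean, projection


# Synthetic "ambient speech": one feature vector per whole syllable (assumed).
rng = np.random.default_rng(0)
syllables = ["a", "ba", "ma"]
n_per_class, n_features = 30, 12
class_means = rng.normal(scale=3.0, size=(len(syllables), n_features))
features = np.vstack(
    [m + rng.normal(size=(n_per_class, n_features)) for m in class_means]
)
labels = np.repeat(syllables, n_per_class)

mean, projection = fit_lda_projection(features, labels)
goal_points = (features - mean) @ projection  # one 2-d point per syllable

# Sanity check: nearest-centroid labeling in goal space.
centroids = np.array([goal_points[labels == s].mean(axis=0) for s in syllables])
distances = np.linalg.norm(goal_points[:, None, :] - centroids[None, :, :], axis=2)
predicted = np.array(syllables)[distances.argmin(axis=1)]
accuracy = float((predicted == labels).mean())
print(goal_points.shape, accuracy)
```

Each row of `goal_points` is the single goal-space point assigned to one syllable sample. The nearest-centroid check only indicates that the three synthetic classes remain separable after projection; it says nothing about performance on real speech.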
Publication date: 2017